ABSTRACT

OriDB (http://www.oridb.org/) is a database con- 
taining collated genome-wide mapping studies of 
confirmed and predicted replication origin sites. 
The original database collated and curated 
Saccharomyces cerevisiae origin mapping studies. 
Here, we report that the OriDB database and web 
site have been revamped to improve user accessi- 
bility to curated data sets, to greatly increase the 
number of curated origin mapping studies, and to 
include the collation of replication origin sites in 
the fission yeast Schizosaccharomyces pombe. 
The revised database structure underlies these im- 
provements and will facilitate further expansion in 
the future. The updated OriDB for S. cerevisiae is 
available at http://cerevisiae.oridb.org/ and for 
S. pombe at http://pombe.oridb.org/. 

INTRODUCTION 

Complete, accurate replication of the genome is crucial for 
life. Chromosomes must be precisely copied exactly once, 
a process that takes place during S phase. To complete 
DNA replication within S phase, replication of eukaryotic 
genomes is initiated at multiple discrete chromosomal sites 
called replication origins. Appropriate distribution of the 
origin sites is important to ensure that every sequence is 
replicated. However, not every origin site is used in every
cell cycle; that is, replication origins differ in their efficiency.
Furthermore, origins activate at characteristic
times during S phase, with some origins activating early 
in S phase and others later. 

Replication origins are best characterized in the 
budding yeast Saccharomyces cerevisiae and the fission 
yeast Schizosaccharomyces pombe. In both organisms, 
origin sequences have been isolated through their ability 
to support plasmid replication (called Autonomously 
Replicating Sequences or ARS) (1,2). Chromosomal
origin activity has been assayed using two-dimensional
(2D) gel electrophoresis to detect replication intermediates
in both S. cerevisiae and S. pombe (3,4). Saccharomyces
cerevisiae origins contain an essential sequence element 
called the ARS consensus sequence (ACS) (5). In 
contrast, S. pombe origins feature AT-rich sequences,
but no specific sequence motif (6). Origin sites in both
yeasts are bound by the Origin Recognition Complex
(ORC), which in turn recruits Cdc6 and Cdt1 to load
Mcm2-7 double hexamers and form a pre-replication 
complex (pre-RC). Assembly of the pre-RC 'licenses' the 
origin for activation in the subsequent S phase. 

Saccharomyces cerevisiae ORC binds to the ACS;
however, the ACS alone is not sufficient for origin
function. Indeed, there are approximately 12 000
matches to the ACS in the genome, but only approximately
500 of these are functional replication origins.
Consequently, there must be additional mechanisms to 
specify replication origin sites. These are thought to 
include transcription that ablates origin function (7,8), 
chromatin structure that can aid ORC recruitment (9,10) 
and secondary sequence motifs (11,12). The S. pombe
Orc4 protein contains AT-hook domains that recognize
and bind AT-rich origin sequences (13). The high AT
content of S. pombe replication origins has allowed their
identification, genome wide, as AT-rich islands (14). 
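
To illustrate why sequence matches alone over-predict origin sites, the following minimal sketch counts exact matches to an ACS-like pattern on both strands of a sequence. It assumes the commonly cited 11-bp core consensus WTTTAYRTTTW and an exact-match criterion; the function names are hypothetical, and the match counts quoted above depend on the criteria of the cited studies, which are not reproduced here.

```python
import re

# IUPAC codes used in the pattern: W = A/T, Y = C/T, R = A/G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "Y": "[CT]", "R": "[AG]"}

ACS_CONSENSUS = "WTTTAYRTTTW"  # assumed core consensus, for illustration only


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_acs_matches(seq, consensus=ACS_CONSENSUS):
    """Return sorted (start, strand) tuples for every exact match (0-based).

    A lookahead is used so that overlapping matches are all reported.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    n, k = len(seq), len(consensus)
    # Convert reverse-strand positions back to forward-strand coordinates.
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)
```

Applied to a whole genome, a scan of this kind returns far more candidate sites than there are functional origins, which is why additional determinants such as chromatin structure and transcription must be considered.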

Genome-wide approaches to identify and characterize 
replication origin locations rely on detecting either the 
origin-associated proteins or the DNA synthesis at 
active origin sites. Chromatin-immunoprecipitation
(ChIP) of ORC and/or MCM proteins has been used
to isolate origin sites (15-17). In S. cerevisiae, this has
been combined with motif searches or phylogenetic footprinting
to predict the location of the ACS (5,9,18). Active
replication origins have been identified as local points of the 
earliest replicating sequence in genome-wide measures of 
when each sequence in the yeast genome replicates (19-22). 
Origin sites have also been identified as sites of BrdU in- 
corporation or accumulation of single-stranded DNA 
when cells are challenged with hydroxyurea (16,23,24). 
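
The timing-based approach can be sketched as a search for local minima in a replication time profile. The example below is a simplified illustration with hypothetical data and a hypothetical function name; the published studies smooth the data and apply further criteria before calling an origin.

```python
def local_minima(positions, rep_times):
    """Return positions whose replication time is lower than both neighbours."""
    if len(positions) != len(rep_times):
        raise ValueError("positions and rep_times must have equal length")
    minima = []
    for i in range(1, len(rep_times) - 1):
        if rep_times[i] < rep_times[i - 1] and rep_times[i] < rep_times[i + 1]:
            minima.append(positions[i])
    return minima


# Hypothetical profile: chromosomal position (kb) and replication time (min).
positions = [0, 10, 20, 30, 40, 50, 60]
rep_times = [30, 22, 18, 25, 33, 27, 35]
print(local_minima(positions, rep_times))  # prints [20, 50]
```
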

Previously, we collated the proposed location of
S. cerevisiae origin sites from the available genome-wide
mapping studies and presented the results in a 
web-accessible database, OriDB (25). This collated data 
set has facilitated comparisons with a range of other 
chromosomal features including transcription (26), 
genomic rearrangements (27) and fragile sites (28,29). 
Furthermore, the comprehensive origin data sets and the 
underlying data have permitted mathematical approaches 
to investigate genome replication (30-32). Now we present 
a major update to OriDB. We have completely 
restructured the underlying database tables to enable the 
incorporation of many additional data sets, improvements 
in user access to the raw data and expansion to a second 
model system, S. pombe. 

RESULTS 

Revised database structure 

The original release of OriDB implemented a simple, but 
limited table structure. The majority of data was stored in 
a single table. This made updating the database
time-consuming and risked introducing errors. The
rapid growth in replication origin studies necessitated a 
complete restructuring of the underlying database. 

We have replaced the original OriDB database tables 
with a large number of non-redundant tables with defined 
relationships (Figure 1). Four primary tables define the 
database and the relationship between all the tables.
First, the table 'sc_ori' contains the list of confirmed,
likely and dubious origin sites collated from published 
studies as described previously (25). Second, the table 
'sc_ori_studies' lists the studies that have published lists 
of origin locations. Third, the table 'sc_repl_data' lists the 
studies for which OriDB has stored the experimental data 
from which origin predictions have been made. Fourth, 
the table 'sc_elements_studies' lists the studies that have 
proposed origin sequence elements. Each of these primary 
tables defines the relationship with further tables that store 
the data from each of these studies. These tables of pub- 
lished data are from genome-wide studies and are supple- 
mented with additional tables that have been collated by 
OriDB from the literature: origin sites confirmed by 2D 
gel electrophoresis (sc_2D_gel), origin sites confirmed by 
ARS assay (cloned_ori) and confirmed origin sequence 
elements (sc_confirmed_ACS). 
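
The layout described above can be sketched with an in-memory SQLite database. Only the table names are taken from the text and Figure 1; every column name, the 'data_table' pointer to a per-study table and the inserted rows are hypothetical, chosen to show how the primary tables could reference per-study tables and the local PubMed table.

```python
import sqlite3

# Hypothetical columns; only the table names come from the text above.
STUDY_COLUMNS = (
    "study_id INTEGER PRIMARY KEY, "
    "data_table TEXT, "  # name of the table holding this study's data
    "pmid INTEGER REFERENCES local_pubmed(pmid)"
)

SCHEMA = f"""
CREATE TABLE local_pubmed (pmid INTEGER PRIMARY KEY, citation TEXT);
CREATE TABLE sc_ori (
    ori_id INTEGER PRIMARY KEY,
    name TEXT,
    chrom INTEGER,
    start_pos INTEGER,
    end_pos INTEGER,
    status TEXT CHECK (status IN ('confirmed', 'likely', 'dubious'))
);
CREATE TABLE sc_ori_studies ({STUDY_COLUMNS});
CREATE TABLE sc_repl_data ({STUDY_COLUMNS});
CREATE TABLE sc_elements_studies ({STUDY_COLUMNS});
"""


def build_demo_db():
    """Create the sketch schema in memory and add placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO local_pubmed VALUES (1, 'Hypothetical study A')")
    conn.execute("INSERT INTO sc_ori_studies VALUES (1, 'study_a_origins', 1)")
    conn.execute(
        "INSERT INTO sc_ori VALUES (1, 'demo-origin', 1, 1000, 1500, 'likely')"
    )
    conn.commit()
    return conn


def list_origin_studies(conn):
    """Return (data_table, citation) for each study that reported origins."""
    rows = conn.execute(
        "SELECT s.data_table, p.citation FROM sc_ori_studies AS s "
        "JOIN local_pubmed AS p ON p.pmid = s.pmid ORDER BY s.study_id"
    )
    return rows.fetchall()
```

Under this kind of design, adding a study amounts to adding one row to a primary table plus one new data table, without altering the existing tables.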

All collated data sets and chromosomal coordinates are 
presented relative to the 1 October 2003 release of the
S. cerevisiae genome (referred to as sacCer1 at the UCSC
genome browser) (33,34). To convert between the
various sequence releases, we have used the liftOver tool
from UCSC (35) with custom-generated parameter
(over.chain) files. Members of the yeast community can
use this tool through a web interface at:
http://www.nieduszynski.org/liftover/.
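
For users who prefer to run the conversion locally, the sketch below wraps the UCSC liftOver command line, whose positional arguments are the input file, the chain file, the output file and a file for unmapped records. The helper names and all file names are hypothetical, and the liftOver binary and a suitable over.chain file must be obtained separately.

```python
import subprocess


def liftover_command(bed_in, chain_file, bed_out, unmapped_out):
    """Build the argument list: liftOver oldFile map.chain newFile unMapped."""
    return ["liftOver", bed_in, chain_file, bed_out, unmapped_out]


def run_liftover(bed_in, chain_file, bed_out, unmapped_out):
    """Run liftOver; requires the UCSC binary to be available on PATH."""
    subprocess.run(
        liftover_command(bed_in, chain_file, bed_out, unmapped_out),
        check=True,  # raise CalledProcessError if the conversion fails
    )


# Hypothetical file names, shown for illustration only.
print(liftover_command("origins_old.bed", "old_to_new.over.chain",
                       "origins_new.bed", "unmapped.bed"))
```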

The restructuring of the OriDB database tables
necessitated a complete re-writing of the web pages. The
resulting changes offer a number of significant benefits for
users. The origin details pages now load all tabs concurrently,
but only display the user-selected tab; this allows
rapid switching between tabs. Furthermore, the new data
structures allow improved user access to the underlying
data, making it straightforward to include many additional
origin mapping data sets and allow for the expansion of
OriDB to include the fission yeast S. pombe. These pages
are available at http://cerevisiae.oridb.org/ and
http://pombe.oridb.org/ (with backup sites available at
http://www.nottingham.ac.uk/plzcnlab/oridb/cerevisiae/ and at
http://www.nottingham.ac.uk/plzcnlab/oridb/pombe/).

Figure 1. Four primary tables define the database structure for S. cerevisiae OriDB. (Left-hand side) The table 'sc_ori_studies' describes the curated
studies that have reported replication origin sites, each of which is represented by a further table. (Middle) The table 'sc_repl_data' describes
additional tables that contain the experimental data from origin mapping studies. (Right-hand side) The table 'sc_elements_studies' describes the
curated studies that reported sequence elements at replication origins; each of these studies is represented by a table. (Bottom) The table 'sc_ori'
contains the collated list of all reported replication origin sites from those studies listed in 'sc_ori_studies'. Finally, each table is linked to the
appropriate PubMed record in a locally stored table ('local_pubmed') that retrieves data directly from PubMed.

Improved user access to data 

The most frequent user request is to retrieve data from an 
OriDB curated study in a user-specified format. The new 
database structure, described above, allows us to imple- 
ment a straightforward yet powerful route to the 
underlying data. A 'download' link present in the top 
bar of every OriDB page allows access to all the data sets
curated at OriDB, including those tables curated from the 
literature. Data sets are grouped first by the data type (e.g. 
predictions of origin location) and then by the original 
study. Access to the underlying data is also available 
from links on the pages that summarize the findings of 
individual studies. The user has a choice of appropriate 
formats for downloading the data (including the raw data 
in a tab or comma separated format, BED or WIG 
formats for display in genome browsers, and FASTA for 
sequence download). These pages and links are generated 
from the underlying database tables and therefore will 
automatically update to include new studies, as they are 
included in OriDB. 
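
As an illustration of one of the download formats, the sketch below writes origin records as BED lines. BED uses 0-based, half-open intervals; the example assumes that the stored coordinates are 1-based and inclusive, which is an assumption rather than a documented property of OriDB, and the record and function names are hypothetical.

```python
def to_bed_line(chrom, start, end, name):
    """Convert a 1-based inclusive interval into a tab-separated BED line."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end.
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def to_bed(records):
    """Join (chrom, start, end, name) records into BED text."""
    return "\n".join(to_bed_line(*record) for record in records) + "\n"


# Hypothetical record, shown for illustration only.
print(to_bed([("chrI", 1001, 1500, "demo-origin")]), end="")
```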

Expanded data coverage 

The original OriDB database collated four genome-wide 
(microarray) data sets (15,19,20,23) and our phylogenetic 
footprinting of origin sequence elements (5). The availability
of high-resolution microarrays (18) and, more recently,
deep-sequencing technologies (9) has led to a large
increase in the number of studies proposing origin locations.
The new database structure has allowed us to integrate
many additional data sets, so that at the time of
writing, S. cerevisiae OriDB includes 10 genome-wide 
data sets and has the capability to include an effectively 
unlimited number in the future. The data from these 
studies are presented to the user through the details page 
for each origin. As in the previous version of OriDB, the 
details page includes an 'Origin Location Assignments' 
tab which lists all the studies that identify the particular 
origin [as described previously this is based upon the 
proposed resolution of the study in question (25)]. The 
'Origin Location Assignments' tab also has the capability 
to display additional information from each study for 
each origin location. For example, a recent study 
mapped the activity of origins in different mutant cells 
subjected to the drug hydroxyurea (24); OriDB includes 
the details of which mutants the origin was reported to be 
active in. 



Collation of S. pombe replication origin sites 

The mapping of replication origin sites in S. pombe has 
drawn on a similar range of experimental techniques as 
used in S. cerevisiae, including ARS assays and 2D gels. 
Although S. pombe replication origins do not contain a
discrete sequence motif for ORC recruitment, the replication
origins have a characteristic AT composition, called
AT islands. The computational identification of AT
islands allowed the accurate prediction of replication
origin sites throughout the S. pombe genome (14). More
recently, genome-wide studies have employed microarray 
technologies to identify origins based upon the location of 
pre-RC proteins, newly synthesized DNA (16), the 
increase in DNA copy number as a sequence replicates 
(21) or the single-stranded DNA that accumulates at 
stalled replication forks (23). Each of these studies 
produced a genome-wide list of replication origin sites. 
To facilitate access to these data sets and allow compari- 
son between them, we generated a single collated list of 
replication origin sites presented through a web-accessible 
database, which includes text and graphic representations 
of the data (Figure 2). The independent studies that 
identified S. pombe replication origins have used a range 
of naming conventions that have resulted in different 
names being assigned to the same origin. To consolidate 
replication origin naming in S. pombe, we have assigned 
each S. pombe replication origin site a systematic name 
based upon the chromosome number (in roman 
numerals) and the chromosomal coordinate. Hence the
origin on chromosome 1 at 3060 kb is named ori-I-3060
[other names for this origin are ars1119 (16), ori1095 (21),
AT1098 (14) and ars766 (1)]. Our collated S. pombe replication
origin data are presented relative to the current
genome sequence, downloaded on 1 October 2011 (36).
The S. pombe replication origin database can be 
accessed at http://pombe.oridb.org/. 
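
The systematic naming scheme can be sketched as follows. The example reproduces the name ori-I-3060 from the text; the choice to round the coordinate to the nearest kilobase is an assumption (the text does not state how coordinates are rounded or which position within the origin is used), and the function name is hypothetical.

```python
# S. pombe has three chromosomes, named with roman numerals.
ROMAN = {1: "I", 2: "II", 3: "III"}


def systematic_name(chromosome, coordinate_bp):
    """Build a name of the form ori-<chromosome>-<coordinate in kb>."""
    if chromosome not in ROMAN:
        raise ValueError("chromosome must be 1, 2 or 3")
    # Assumed rule: round to the nearest kb (halves round up).
    kb = (coordinate_bp + 500) // 1000
    return f"ori-{ROMAN[chromosome]}-{kb}"


print(systematic_name(1, 3060000))  # prints ori-I-3060
```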

DISCUSSION 

In the era of high-throughput genome-wide data gener- 
ation, it is essential that the scientific community can 
access the data and the conclusions drawn from these 
data. For replication origin mapping studies, this means 
access to microarray (or deep sequencing) data and the 
inferred origin locations. OriDB aims to provide access 
to exactly these data types, presenting them through a 
user-friendly interface. In this update, we improve user 
access to the underlying data (now available for 
download), extend the number of studies collated and,
for the first time, collate origin sites from S. pombe.

Figure 2. Screen shot from S. pombe OriDB showing the Origin Summary Graphic tab for ori-I-3060. A window of the S. pombe genome is
shown centred upon the origin of interest. (Top) The gene structure is shown ('mouse over' displays the name of each gene). (Main plot)
Vertical bars show the replication origin sites (black for confirmed; dark grey for likely; light grey for dubious). Blue and green bars illustrate the
location of signal from ChIP of Mcm6 and Orc1, respectively (16). The red curve gives the increase in DNA content during DNA replication in
the presence of hydroxyurea (21). The blue curve shows the accumulation of single-stranded DNA during DNA replication in Δcds1 cells exposed to
hydroxyurea (23).